Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Velero analyzer #1366

Merged
merged 9 commits into from
Nov 3, 2023
Merged

Velero analyzer #1366

merged 9 commits into from
Nov 3, 2023

Conversation

arcolife
Copy link
Contributor

@arcolife arcolife commented Oct 11, 2023

Description, Motivation and Context

Velero analyzer (collector)

Fixes: #806
discards the need for collector: #1365

Checklist

  • New and existing tests pass locally with introduced changes.
  • Tests for the changes have been added (for bug fixes / features)
  • The commit message(s) are informative and highlight any breaking changes
  • Any documentation required has been added/updated. add velero analyzer docs troubleshoot.sh#528

Does this PR introduce a breaking change?

  • Yes
  • No

@arcolife
Copy link
Contributor Author

arcolife commented Oct 11, 2023

WIP status

Check FAIL
Title: Backup observability-backup-failed
Message: Backup observability-backup-failed phase is PartiallyFailed

------------
Check PASS
Title: Velero Backups count
Message: Found 3 backups

------------
Check PASS
Title: At least 1 Velero Backup Repository configured
Message: Found 2 configured backup repositories

------------

@arcolife arcolife force-pushed the velero-analyzer branch 4 times, most recently from 822dc53 to 39bb9fd Compare October 12, 2023 19:05
@arcolife
Copy link
Contributor Author

status:

Check PASS
Title: At least 1 Velero Backup Repository configured
Message: Found 2 backup repositories configured and 2 Ready

------------
Check FAIL
Title: Backup observability-backup-failed
Message: Backup observability-backup-failed phase is PartiallyFailed

------------
Check PASS
Title: Velero Backups count
Message: Found 3 backups

------------
Check WARN
Title: Backup Storage Location default-ue1-manual
Message: Backup Storage Location [default-ue1-manual] is in phase Unavailable

------------
Check WARN
Title: Backup Storage Location default-uw2
Message: Backup Storage Location [default-uw2] is in phase Unavailable

------------
Check PASS
Title: At least 1 Velero Backup Storage Location configured
Message: Found 5 backup storage locations configured and 3 Available

------------
Check PASS
Title: Pod Volume Backups count
Message: Found 1 pod volume backups

------------
Check PASS
Title: Velero Restores count
Message: Found 1 restores

------------
Check PASS
Title: Velero Volume Snapshot Locations count
Message: Found 1 volume snapshot locations

------------

@arcolife
Copy link
Contributor Author

Screenshot 2023-10-13 at 2 25 22 AM

@arcolife arcolife marked this pull request as ready for review October 12, 2023 20:57
@arcolife arcolife requested a review from a team as a code owner October 12, 2023 20:57
banjoh
banjoh previously requested changes Oct 16, 2023
Copy link
Member

@banjoh banjoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, lets try to see if this analyser can achieve the same goal using what both the cluster-resources and logs collectors collect. If its possible, we will not need a separate velero collector implementation

Here is the spec we looked at

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: support-bundle
spec:
  collectors:
    - clusterResources: {} # Usually omitted since its added by default
    - logs:
        namespace: velero
  analysers:
    - velero: {}

@arcolife arcolife force-pushed the velero-analyzer branch 2 times, most recently from f9d80a4 to b6e8c3c Compare October 19, 2023 11:39
@arcolife arcolife requested a review from banjoh October 19, 2023 11:43
@arcolife
Copy link
Contributor Author

arcolife commented Oct 19, 2023

adding tests but this runs without the collector now.

@banjoh Let's discuss your concerns around client libs, supporting old velero etc

go mod tidy added a velero 1.12 but can downgrade this to 1.11 for starters until confirmed internally.

Copy link
Member

@banjoh banjoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The analyser has taken really good shape.

As for using velero client libraries, I see no harm in doing so. My concern was in trying to have different client library versions so as to handle deprecated GVKs (velerov1.BackupRepositoryTypeResti). I do not see any reason for taking this approach. We can analyse the k8s objects as they are using k8s unstructured type. Here is an example of me using this type instead of the types implemented by velero client library.

We will however need to see how using a different version of velero affects other projects that import troubleshoot as a dependency. go tooling should handle this dependency issue pretty well, but lets test it with some sample "hello world" application.

pkg/analyze/velero.go Outdated Show resolved Hide resolved
@arcolife arcolife force-pushed the velero-analyzer branch 6 times, most recently from 898ac55 to 7a2a80a Compare October 31, 2023 09:31
Copy link
Member

@banjoh banjoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good apart from the Namespace comment. I think it would need to be changed (removed?) since it contributes to the collector spec which, if we need to remove, might require following a deprecation process

pkg/analyze/velero.go Outdated Show resolved Hide resolved
pkg/analyze/velero.go Outdated Show resolved Hide resolved
pkg/analyze/velero.go Outdated Show resolved Hide resolved
banjoh
banjoh previously approved these changes Nov 3, 2023
Copy link
Member

@banjoh banjoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I added some code suggestions

pkg/analyze/velero.go Outdated Show resolved Hide resolved
@arcolife arcolife force-pushed the velero-analyzer branch 2 times, most recently from 532dbef to 7ab1ee7 Compare November 3, 2023 11:40
@arcolife arcolife requested review from banjoh and DexterYan November 3, 2023 11:41
  * updated schemas
  * velero analyzer without collector

Signed-off-by: Archit Sharma <[email protected]>
Signed-off-by: Archit Sharma <[email protected]>
Signed-off-by: Archit Sharma <[email protected]>
Signed-off-by: Archit Sharma <[email protected]>
..better than using unstructured and be the deer in headlights

Signed-off-by: Archit Sharma <[email protected]>
Signed-off-by: Archit Sharma <[email protected]>
@arcolife
Copy link
Contributor Author

arcolife commented Nov 3, 2023

# go test -v -timeout 30s -run '^(TestAnalyzeVelero.*)$' github.com/replicatedhq/troubleshoot/pkg/analyze

=== RUN   TestAnalyzeVelero_BackupRepositories
=== RUN   TestAnalyzeVelero_BackupRepositories/no_backup_repositories
=== RUN   TestAnalyzeVelero_BackupRepositories/1_backup_repository_and_1_Ready
=== RUN   TestAnalyzeVelero_BackupRepositories/2_backup_repositories_and_1_Ready
=== RUN   TestAnalyzeVelero_BackupRepositories/1_backup_repository_and_none_Ready
--- PASS: TestAnalyzeVelero_BackupRepositories (0.00s)
    --- PASS: TestAnalyzeVelero_BackupRepositories/no_backup_repositories (0.00s)
    --- PASS: TestAnalyzeVelero_BackupRepositories/1_backup_repository_and_1_Ready (0.00s)
    --- PASS: TestAnalyzeVelero_BackupRepositories/2_backup_repositories_and_1_Ready (0.00s)
    --- PASS: TestAnalyzeVelero_BackupRepositories/1_backup_repository_and_none_Ready (0.00s)
=== RUN   TestAnalyzeVelero_ResticRepositories
=== RUN   TestAnalyzeVelero_ResticRepositories/no_restic_repositories
=== RUN   TestAnalyzeVelero_ResticRepositories/1_restic_repository_and_1_Ready
--- PASS: TestAnalyzeVelero_ResticRepositories (0.00s)
    --- PASS: TestAnalyzeVelero_ResticRepositories/no_restic_repositories (0.00s)
    --- PASS: TestAnalyzeVelero_ResticRepositories/1_restic_repository_and_1_Ready (0.00s)
=== RUN   TestAnalyzeVelero_Backups
=== RUN   TestAnalyzeVelero_Backups/no_backups
=== RUN   TestAnalyzeVelero_Backups/1_backup_and_1_Completed
=== RUN   TestAnalyzeVelero_Backups/1_backup_and_1_Failed
--- PASS: TestAnalyzeVelero_Backups (0.00s)
    --- PASS: TestAnalyzeVelero_Backups/no_backups (0.00s)
    --- PASS: TestAnalyzeVelero_Backups/1_backup_and_1_Completed (0.00s)
    --- PASS: TestAnalyzeVelero_Backups/1_backup_and_1_Failed (0.00s)
=== RUN   TestAnalyzeVelero_BackupStorageLocations
=== RUN   TestAnalyzeVelero_BackupStorageLocations/no_backup_storage_locations
=== RUN   TestAnalyzeVelero_BackupStorageLocations/1_backup_storage_location_and_1_Available
=== RUN   TestAnalyzeVelero_BackupStorageLocations/1_backup_storage_location_and_1_Unavailable
--- PASS: TestAnalyzeVelero_BackupStorageLocations (0.00s)
    --- PASS: TestAnalyzeVelero_BackupStorageLocations/no_backup_storage_locations (0.00s)
    --- PASS: TestAnalyzeVelero_BackupStorageLocations/1_backup_storage_location_and_1_Available (0.00s)
    --- PASS: TestAnalyzeVelero_BackupStorageLocations/1_backup_storage_location_and_1_Unavailable (0.00s)
=== RUN   TestAnalyzeVelero_DeleteBackupRequests
=== RUN   TestAnalyzeVelero_DeleteBackupRequests/no_backup_delete_requests
=== RUN   TestAnalyzeVelero_DeleteBackupRequests/backup_delete_requests_completed
=== RUN   TestAnalyzeVelero_DeleteBackupRequests/backup_delete_requests_summarize_in_progress
--- PASS: TestAnalyzeVelero_DeleteBackupRequests (0.00s)
    --- PASS: TestAnalyzeVelero_DeleteBackupRequests/no_backup_delete_requests (0.00s)
    --- PASS: TestAnalyzeVelero_DeleteBackupRequests/backup_delete_requests_completed (0.00s)
    --- PASS: TestAnalyzeVelero_DeleteBackupRequests/backup_delete_requests_summarize_in_progress (0.00s)
=== RUN   TestAnalyzeVelero_PodVolumeBackups
=== RUN   TestAnalyzeVelero_PodVolumeBackups/no_pod_volume_backups
=== RUN   TestAnalyzeVelero_PodVolumeBackups/pod_volume_backups
--- PASS: TestAnalyzeVelero_PodVolumeBackups (0.00s)
    --- PASS: TestAnalyzeVelero_PodVolumeBackups/no_pod_volume_backups (0.00s)
    --- PASS: TestAnalyzeVelero_PodVolumeBackups/pod_volume_backups (0.00s)
=== RUN   TestAnalyzeVelero_PodVolumeRestores
=== RUN   TestAnalyzeVelero_PodVolumeRestores/no_pod_volume_restores
=== RUN   TestAnalyzeVelero_PodVolumeRestores/pod_volume_restores_-_no_failures
=== RUN   TestAnalyzeVelero_PodVolumeRestores/pod_volume_restores_-_failures
--- PASS: TestAnalyzeVelero_PodVolumeRestores (0.00s)
    --- PASS: TestAnalyzeVelero_PodVolumeRestores/no_pod_volume_restores (0.00s)
    --- PASS: TestAnalyzeVelero_PodVolumeRestores/pod_volume_restores_-_no_failures (0.00s)
    --- PASS: TestAnalyzeVelero_PodVolumeRestores/pod_volume_restores_-_failures (0.00s)
=== RUN   TestAnalyzeVelero_Restores
=== RUN   TestAnalyzeVelero_Restores/no_restores
=== RUN   TestAnalyzeVelero_Restores/restores_completed
=== RUN   TestAnalyzeVelero_Restores/restores_-_failures
--- PASS: TestAnalyzeVelero_Restores (0.00s)
    --- PASS: TestAnalyzeVelero_Restores/no_restores (0.00s)
    --- PASS: TestAnalyzeVelero_Restores/restores_completed (0.00s)
    --- PASS: TestAnalyzeVelero_Restores/restores_-_failures (0.00s)
=== RUN   TestAnalyzeVelero_Schedules
=== RUN   TestAnalyzeVelero_Schedules/no_schedules
=== RUN   TestAnalyzeVelero_Schedules/schedules_configured
=== RUN   TestAnalyzeVelero_Schedules/schedules_-_failures
--- PASS: TestAnalyzeVelero_Schedules (0.00s)
    --- PASS: TestAnalyzeVelero_Schedules/no_schedules (0.00s)
    --- PASS: TestAnalyzeVelero_Schedules/schedules_configured (0.00s)
    --- PASS: TestAnalyzeVelero_Schedules/schedules_-_failures (0.00s)
=== RUN   TestAnalyzeVelero_VolumeSnapshotLocations
=== RUN   TestAnalyzeVelero_VolumeSnapshotLocations/no_volume_snapshot_locations
=== RUN   TestAnalyzeVelero_VolumeSnapshotLocations/volume_snapshot_locations_configured
=== RUN   TestAnalyzeVelero_VolumeSnapshotLocations/volume_snapshot_locations_-_failures
--- PASS: TestAnalyzeVelero_VolumeSnapshotLocations (0.00s)
    --- PASS: TestAnalyzeVelero_VolumeSnapshotLocations/no_volume_snapshot_locations (0.00s)
    --- PASS: TestAnalyzeVelero_VolumeSnapshotLocations/volume_snapshot_locations_configured (0.00s)
    --- PASS: TestAnalyzeVelero_VolumeSnapshotLocations/volume_snapshot_locations_-_failures (0.00s)
=== RUN   TestAnalyzeVelero_Logs
=== RUN   TestAnalyzeVelero_Logs/no_logs
=== RUN   TestAnalyzeVelero_Logs/logs_-_no_errors_in_node-agent*_pods
=== RUN   TestAnalyzeVelero_Logs/logs_-_no_errors_in_velero*_pods
=== RUN   TestAnalyzeVelero_Logs/logs_-_errors_in_node-agent*_pods
--- PASS: TestAnalyzeVelero_Logs (0.00s)
    --- PASS: TestAnalyzeVelero_Logs/no_logs (0.00s)
    --- PASS: TestAnalyzeVelero_Logs/logs_-_no_errors_in_node-agent*_pods (0.00s)
    --- PASS: TestAnalyzeVelero_Logs/logs_-_no_errors_in_velero*_pods (0.00s)
    --- PASS: TestAnalyzeVelero_Logs/logs_-_errors_in_node-agent*_pods (0.00s)
=== RUN   TestAnalyzeVelero_Results
=== RUN   TestAnalyzeVelero_Results/no_results
=== RUN   TestAnalyzeVelero_Results/results_-_pass
=== RUN   TestAnalyzeVelero_Results/results_-_fail
--- PASS: TestAnalyzeVelero_Results (0.00s)
    --- PASS: TestAnalyzeVelero_Results/no_results (0.00s)
    --- PASS: TestAnalyzeVelero_Results/results_-_pass (0.00s)
    --- PASS: TestAnalyzeVelero_Results/results_-_fail (0.00s)
PASS
ok      github.com/replicatedhq/troubleshoot/pkg/analyze        (cached)

pkg/analyze/velero.go Outdated Show resolved Hide resolved
pkg/analyze/velero.go Outdated Show resolved Hide resolved
pkg/apis/troubleshoot/v1beta2/analyzer_shared.go Outdated Show resolved Hide resolved
banjoh
banjoh previously approved these changes Nov 3, 2023
Signed-off-by: Archit Sharma <[email protected]>
Copy link
Member

@banjoh banjoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@arcolife arcolife merged commit 7038da8 into replicatedhq:main Nov 3, 2023
23 checks passed
arcolife added a commit to replicatedhq/troubleshoot.sh that referenced this pull request Nov 3, 2023
* add velero analyzer docs 

for replicatedhq/troubleshoot/pull/1366

---------

Signed-off-by: Archit Sharma <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Velero Collector and Analyzer
3 participants